Specify asset path at top level in API #1700

jjnesbitt · 2023-10-05T18:59:38Z

Closes #1687, follow-up to #1689

This is a breaking API change.

Previously, for asset creation/update, the path of the asset was specified within the metadata field. This is confusing, as an asset's path is stored directly on the model itself (not in the metadata), and was the leading cause of #1682. This PR changes that behavior, by requiring an asset's path to be specified directly at the top level, alongside metadata, blob_id, and zarr_id.

waxlamp

Looks good overall, but I had one question about the user-facing interface changes.

waxlamp · 2023-10-06T13:20:55Z

dandiapi/api/views/asset.py

-        if 'path' not in data['metadata'] or not data['metadata']['path']:
-            raise serializers.ValidationError({'metadata': 'No path specified in metadata.'})
+        if 'path' in data['metadata']:
+            raise serializers.ValidationError(
+                {'metadata': 'Path must only be specified at top level.'}
+            )


Is this the part that makes this a breaking change?

Related question: it would be reasonable under the general usage of PUT endpoints to copy the metadata from a GET call and then paste it into the PUT call; if someone does that, will the presence of the path entry cause this error?

This is half of the breaking change yes, the other half is being required to specify the path at the top level (i.e. violating either of these will fail the request).

Related question: it would be reasonable under the general usage of PUT endpoints to copy the metadata from a GET call and then paste it into the PUT call; if someone does that, will the presence of the path entry cause this error?

Yes it will cause this error. If we'd like, we could still allow path in the metadata and just ignore it.

Assuming we'll disallow path: does this check belong in the serializer or the service layer? Since we would want to prevent it regardless of where it came from, I think it belongs in the service layer. And if that's the case, is path still a necessary element in ASSET_COMPUTED_FIELDS?

I think we'd still need to keep path as an entry in ASSET_COMPUTED_FIELDS, since it's still returned as a part of the computed metadata (and as such we'd want to prevent storing it in the metadata directly).

The validation error I added here is more a signal externally to prevent confusion / false bug reporting (e.g. "I changed path in the metadata, why does nothing change?"). If we assume we're keeping path as a field in ASSET_COMPUTED_FIELDS, I don't see what the benefit of performing a check in the service layer is, since the field will be stripped out anyway.

What do you think? It's possible I'm missing something.

FYI Jake and I discussed this and settled on generalizing the "path can't be specified as metadata" requirement to all reserved (computed) fields. As it stands, the same bug as #1682 exists for the other fields.

In this context computed means "computed from other properties of the asset or asset metadata". In the case of path, since it's stored directly on an asset, it's "computed" by just injecting that field into the metadata.

thanks @AlmightyYakob - i still find it weird that the API requires this as a separate element as opposed to being able to pull it out internally from the metadata as it used to. locally the CLI validates the asset with these fields inside, and hence seems easiest from a programmer standpoint using the API to have one point of entry. since this is a breaking API change, just want to use the opportunity to both reflect that dandi objects are the metadata and ensuring a good pattern for outside clients, which are growing beyond us, interacting with the API.

The intent of this change is to make it more clear the distinction between modifiable and un-modifiable metadata. The issue with the current state of things is as follows:

You have an asset and it's metadata, and you want to change a field in that asset's metadata. Depending on what field you want to modify, it will actually result in no change when the request is made to the API, since some fields are computed/derived/etc, and some are actually modifiable. Things like contentUrl and digest are examples of fields that you cannot modify directly. They are computed/derived from the asset, or its associated asset blob / zarr.

The path field is a unique example, where it's a field that is computed (since that data is stored on the asset itself and is then injected into the metadata), yet prior to this proposed change, was also specified in the metadata, and then stripped out by the server. This is counter-intuitive behavior, and is the reason I originally intended on barring path from being specified in the metadata with this PR. Dan and I also discussed extending this to all computed fields, so that there is a clear one-way data flow for any computed fields (top level -> metadata).

Leaving things the way they are now, how would a consumer of the API know which fields can be modified and which can't? The only way they could know (aside from reading the source code), is through trial and error, which seems undesirable from a user-facing point of view. If we were to update the API to make it explicit the fields which can and can't be modified, I think that would result in a better experience for consumers of the API.

I do see your point, in the sense that the format of the API and that of the Dandi schema would be diverging somewhat. However, regardless of the uniformity between the two formats, the dichotomy between metadata that can be updated, and computed metadata, still exists.

@AlmightyYakob - sorry for the delay.

The path field is a unique example, where it's a field that is computed (since that data is stored on the asset itself and is then injected into the metadata), yet prior to this proposed change, was also specified in the metadata, and then stripped out by the server.

regarding path, i don't know why it was being stripped out or inserted by the server. path is indeed special as it is the unique key to an asset. i.e. the same path cannot point to multiple blobs.

The only way they could know (aside from reading the source code), is through trial and error, which seems undesirable from a user-facing point of view.

indeed this is the case right now, and is really because of the API service not trusting clients to determine what those fields are. however, the exception is still for path - which is definitely a client prerogative to set, not for the server to decide. the API should have nothing to say about paths except regarding "our" policies on clashes with either POST/PUT.

can we better understand why path should be computed?

the dichotomy between metadata that can be updated, and computed metadata, still exists.

let's figure out a way to add this info to dandischema itself.

I also do not grasp why it is exactly computed, besides may be that it is first extracted from metadata for the asset record and then placed back into it (why?).

satra · 2023-10-06T13:32:46Z

it may really be helpful to document the lifecycle of an asset somewhere (can point to code pieces as needed). this should include what is injected asset metadata and when.

also if we are making API changes related to the asset, can we also address the blob/zarr vs blob_id/zarr_id difference between API returns and API POST/PUT endpoints and make them consistent.

jjnesbitt · 2023-10-06T16:23:05Z

it may really be helpful to document the lifecycle of an asset somewhere (can point to code pieces as needed). this should include what is injected asset metadata and when.

Good point, I agree.

also if we are making API changes related to the asset, can we also address the blob/zarr vs blob_id/zarr_id difference between API returns and API POST/PUT endpoints and make them consistent.

You're referring to #1686, correct? I think we can do that, but I'd prefer to do it in a separate PR, to keep things clean. We should delay the merging/deployment of these two PRs until the CLI has the corresponding changes ready to go.

yarikoptic · 2023-11-01T15:56:22Z

I feel that this a "good to have" and not really "has to have" change, and that is why not sure if it is worth the API breakage at this point. Should we may be better

establish "dandi-api-2.0" (or whatever we want to call it, e.g. dandi-api-0.2 to signify that it is not yet stable despite all the years; or dandi-api-1.0 - as the API we would approach as "finally a stable one to aim") project where we start collecting the issues/PRs which would require API breakage and bundle them all.
do get back to documenting "lifecycle" and formalize then how lifecycle would need to change in that "new" one

WDYT?

jjnesbitt · 2023-12-05T19:37:31Z

I'm closing this PR, and will be making a new PR containing the generalized solution discussed here. That will still constitute a breaking API change, but will be less disruptive, as it'll be contained in one PR.

jjnesbitt added 2 commits October 5, 2023 14:19

Specify asset path at top level in API

e1dae0d

Add test against API path regression

37a666d

jjnesbitt requested review from danlamanna and waxlamp October 5, 2023 18:59

waxlamp reviewed Oct 6, 2023

View reviewed changes

jjnesbitt closed this Dec 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify asset path at top level in API #1700

Specify asset path at top level in API #1700

jjnesbitt commented Oct 5, 2023

waxlamp left a comment

waxlamp Oct 6, 2023

jjnesbitt Oct 6, 2023

danlamanna Oct 12, 2023

jjnesbitt Oct 12, 2023

danlamanna Oct 12, 2023

jjnesbitt Oct 13, 2023

satra Oct 13, 2023

jjnesbitt Oct 23, 2023 •

edited

Loading

satra Oct 30, 2023

yarikoptic Nov 1, 2023

satra commented Oct 6, 2023

jjnesbitt commented Oct 6, 2023

yarikoptic commented Nov 1, 2023

jjnesbitt commented Dec 5, 2023

Specify asset path at top level in API #1700

Specify asset path at top level in API #1700

Conversation

jjnesbitt commented Oct 5, 2023

waxlamp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jjnesbitt Oct 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

satra commented Oct 6, 2023

jjnesbitt commented Oct 6, 2023

yarikoptic commented Nov 1, 2023

jjnesbitt commented Dec 5, 2023

jjnesbitt Oct 23, 2023 •

edited

Loading